几乎没有弹出的文本分类旨在在几个弹奏方案下对文本进行分类。以前的大多数方法都采用基于优化的元学习来获得任务分布。但是,由于少数样本和复杂模型之间的匹配以及有用的任务功能之间的区别,这些方法遭受了过度拟合问题的影响。为了解决这个问题,我们通过梯度相似性(AMGS)方法提出了一种新颖的自适应元学习器,以提高模型的泛化能力。具体而言,拟议的AMG基于两个方面缓解了过度拟合:(i)通过内部循环中的自我监督的辅助任务来获取样品的潜在语义表示并改善模型的概括,(ii)利用适应性元学习者通过适应性元学习者通过梯度通过相似性,可以在外环中基底学习者获得的梯度上增加约束。此外,我们对正则化对整个框架的影响进行系统分析。对几个基准测试的实验结果表明,与最先进的优化元学习方法相比,提出的AMG始终提高了很少的文本分类性能。
translated by 谷歌翻译
原始收集的培训数据通常带有从多个不完美的注释器中收集的单独的嘈杂标签(例如,通过众包)。通常,首先将单独的嘈杂标签汇总为一个,并应用标准培训方法。文献还广泛研究了有效的聚合方法。本文重新审视了此选择,并旨在为一个问题提供一个答案,即是否应该将单独的嘈杂标签汇总为单个单个标签或单独使用它们作为给定标签。我们从理论上分析了许多流行损失功能的经验风险最小化框架下的两种方法的性能,包括专门为使用嘈杂标签学习的问题而设计的损失功能。我们的定理得出的结论是,当噪声速率较高时,标签分离优于标签聚集,或者标记器/注释的数量不足。广泛的经验结果证明了我们的结论。
translated by 谷歌翻译
在这项工作中,我们在分配强化学习方面建立了最新的进步,以基于IQN提供模型的最新分配变体。我们通过使用GAN模型的生成器和鉴别器功能与分位数回归来实现这一目标,从而近似于状态返回分布的完整分位数。我们证明了基线数据集的性能提高-57 Atari 2600游戏。此外,我们使用算法来显示Atari游戏中风险敏感政策的最新培训表现,并通过政策优化和评估。
translated by 谷歌翻译
当前的最佳性能模型用于知识图推理(KGR)将几何学对象或概率分布引入嵌入实体,并将一阶逻辑(fol)查询引入低维矢量空间。它们可以总结为中心尺寸框架(点/框/锥,β/高斯分布等)。但是,它们具有有限的逻辑推理能力。而且很难概括到各种功能,因为中心和大小是一对一的约束,无法具有多个中心或尺寸。为了应对这些挑战,我们相反提出了一个名为“特征逻辑嵌入框架Flex”的新颖的KGR框架,这是第一个KGR框架,它不仅可以真正处理所有运营,包括连词,析取,否定,否定等等,而且还支持各种操作特征空间。具体而言,特征逻辑框架的逻辑部分是基于向量逻辑的,它自然地对所有FOL操作进行了建模。实验表明,FLEX在基准数据集上明显优于现有的最新方法。
translated by 谷歌翻译
半监督学习(SSL)证明了其在高质量监督数据受到严重限制时提高各种学习任务的模型准确性的潜力。尽管经常确定,整个数据群的平均准确性得到了改善,但尚不清楚SSL如何具有不同的子人群的票价。当我们旨在公平对待的人口群体定义不同的子人群时,了解上述问题具有很大的公平意义。在本文中,我们揭示了部署SSL的不同影响:在不使用SSL(“ Rich” One)的情况下具有较高基线准确性的子人群倾向于从SSL中受益更多;尽管添加SSL模块后,遭受低基线准确性(“穷”)的子人群甚至可能会观察到性能下降。我们从理论上和经验上为广泛的SSL算法建立上述观察结果,该算法是明确或隐式使用辅助“伪标签”。一组图像和文本分类任务的实验证实了我们的主张。我们介绍了一个新的度量,收益比,并促进对SSL公平性(均等福利比)的评估。我们进一步讨论如何减轻不同的影响。我们希望我们的论文能够震惊使用SSL的潜在陷阱,并鼓励对未来SSL算法进行多方面评估。
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
We present the interpretable meta neural ordinary differential equation (iMODE) method to rapidly learn generalizable (i.e., not parameter-specific) dynamics from trajectories of multiple dynamical systems that vary in their physical parameters. The iMODE method learns meta-knowledge, the functional variations of the force field of dynamical system instances without knowing the physical parameters, by adopting a bi-level optimization framework: an outer level capturing the common force field form among studied dynamical system instances and an inner level adapting to individual system instances. A priori physical knowledge can be conveniently embedded in the neural network architecture as inductive bias, such as conservative force field and Euclidean symmetry. With the learned meta-knowledge, iMODE can model an unseen system within seconds, and inversely reveal knowledge on the physical parameters of a system, or as a Neural Gauge to "measure" the physical parameters of an unseen system with observed trajectories. We test the validity of the iMODE method on bistable, double pendulum, Van der Pol, Slinky, and reaction-diffusion systems.
translated by 谷歌翻译
In this paper, a semantic communication framework for image transmission is developed. In the investigated framework, a set of servers cooperatively transmit images to a set of users utilizing semantic communication techniques. To evaluate the performance of studied semantic communication system, a multimodal metric is proposed to measure the correlation between the extracted semantic information and the original image. To meet the ISS requirement of each user, each server must jointly determine the semantic information to be transmitted and the resource blocks (RBs) used for semantic information transmission. We formulate this problem as an optimization problem aiming to minimize each server's transmission latency while reaching the ISS requirement. To solve this problem, a value decomposition based entropy-maximized multi-agent reinforcement learning (RL) is proposed, which enables servers to coordinate for training and execute RB allocation in a distributed manner to approach to a globally optimal performance with less training iterations. Compared to traditional multi-agent RL, the proposed RL improves the valuable action exploration of servers and the probability of finding a globally optimal RB allocation policy based on local observation. Simulation results show that the proposed algorithm can reduce the transmission delay by up to 16.1% compared to traditional multi-agent RL.
translated by 谷歌翻译
New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
translated by 谷歌翻译
With the development of technology and sharing economy, Airbnb as a famous short-term rental platform, has become the first choice for many young people to select. The issue of Airbnb's pricing has always been a problem worth studying. While the previous studies achieve promising results, there are exists deficiencies to solve. Such as, (1) the feature attributes of rental are not rich enough; (2) the research on rental text information is not deep enough; (3) there are few studies on predicting the rental price combined with the point of interest(POI) around the house. To address the above challenges, we proposes a multi-source information embedding(MSIE) model to predict the rental price of Airbnb. Specifically, we first selects the statistical feature to embed the original rental data. Secondly, we generates the word feature vector and emotional score combination of three different text information to form the text feature embedding. Thirdly, we uses the points of interest(POI) around the rental house information generates a variety of spatial network graphs, and learns the embedding of the network to obtain the spatial feature embedding. Finally, this paper combines the three modules into multi source rental representations, and uses the constructed fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.
translated by 谷歌翻译